MPI on a Million Processors
Authors
Abstract
Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine the issue of how scalable MPI is. We first examine the MPI specification itself, discuss areas with scalability concerns, and show how they can be overcome. We then investigate issues that an MPI implementation must address in order to be scalable. We ran experiments to measure MPI memory consumption at scale on up to 131,072 processes, or 80% of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in the algorithmic scalability of applications to large process counts, and features of MPI that enable the use of other techniques to overcome scalability limitations in applications.
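Memory-footprint measurements of the kind described above can be reproduced in spirit with a small probe program. The following sketch is an illustration only, not the instrumentation used in the paper: it assumes a Linux-style /proc/self/status with a VmRSS field, and reports the maximum resident-set size across all ranks immediately after MPI_Init.

    /* Hedged sketch: per-process memory right after MPI_Init, max over ranks.
     * Reading /proc/self/status is a Linux convention; other systems (e.g.
     * Blue Gene/P's CNK) expose this information differently. */
    #include <mpi.h>
    #include <stdio.h>

    static long rss_kb(void)            /* resident-set size in kB, or -1 */
    {
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];
        long kb = -1;
        if (!f) return -1;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
        fclose(f);
        return kb;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        long mine, max;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        mine = rss_kb();
        MPI_Reduce(&mine, &max, 1, MPI_LONG, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%d processes: max RSS after MPI_Init = %ld kB\n", size, max);
        MPI_Finalize();
        return 0;
    }

Running such a probe at increasing process counts exposes any per-process data structures in the MPI library that grow with the number of ranks.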
Similar resources
PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications
The size of supercomputers in numbers of processors is growing exponentially. Today's largest supercomputers have upwards of a hundred thousand processors, and tomorrow's may have on the order of one million. The applications that run on these systems commonly coordinate their parallel activities via MPI; a trace of these MPI communication events is an important input for tools that visualize, simu...
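Event tracers of this kind typically rely on the standard MPI profiling interface (PMPI), in which any MPI call can be intercepted by a same-named wrapper that records an event and then forwards to the corresponding PMPI_ entry point. The sketch below shows the idea for MPI_Send; the trace line format is invented for illustration and is not the PSINS format.

    /* Hedged sketch of PMPI-based tracing (MPI-3 signature; drop the const
     * qualifier on buf for older MPI headers). */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "TRACE rank=%d MPI_Send dest=%d count=%d dt=%.6fs\n",
                rank, dest, count, MPI_Wtime() - t0);
        return rc;
    }

Linking such wrappers into an application (or preloading them as a shared library) produces a per-rank event log without modifying the application source.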
Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication
Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on interprocess communication. In the case of distributed matrix-matrix multiplications, much of this time is spent on interchanging the partial results that are needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel spa...
Study of parallel programming models on computer clusters with Intel MIC coprocessors
Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the...
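A minimal sketch of the hybrid style such studies compare is given below, assuming one MPI process per node or coprocessor with OpenMP threads inside it; MPI_THREAD_FUNNELED is chosen only as an example thread-support level.

    /* Hedged sketch: hybrid MPI+OpenMP "hello" showing the usual structure. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        /* Only the main thread makes MPI calls; worker threads do compute. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel
        {
            printf("rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }
        MPI_Finalize();
        return 0;
    }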
rMPI: Message Passing on Multicore Processors with On-Chip Interconnect
With multicore processors becoming the standard architecture, programmers are faced with the challenge of developing applications that capitalize on multicore's advantages. This paper presents rMPI, which leverages the on-chip networks of multicore processors to build a powerful abstraction with which many programmers are familiar: the MPI programming interface. To our knowledge, rMPI is the fir...
MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling
Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments and the calculations are distributed over different processors. ...
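The parallelization pattern described here is typically a domain decomposition with halo (ghost-cell) exchange between neighbouring subdomains at every time step. The sketch below shows that idea for a 1-D decomposition in MPI; the array size, number of time steps, and the omitted stencil update are placeholders, not the scheme used in the cited work.

    /* Hedged sketch: 1-D domain decomposition with halo exchange for an
     * explicit finite-difference time loop. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024   /* local interior points per rank (placeholder) */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *u = calloc(N + 2, sizeof *u);   /* u[0], u[N+1] are halo cells */
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int step = 0; step < 100; step++) {
            /* exchange halo cells with neighbouring subdomains */
            MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left,  0,
                         &u[N + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 1,
                         &u[0], 1, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* ... apply the finite-difference stencil to u[1..N] here ... */
        }

        free(u);
        MPI_Finalize();
        return 0;
    }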
Publication date: 2009